Deterministic Heavy Hitters with Sublinear Query Time

Authors

  • Yi Li
  • Vasileios Nakos
Abstract

This paper studies the classic problem of finding heavy hitters in the turnstile streaming model. We give the first deterministic linear sketch that has $O(\epsilon^{-2} \log n \cdot \log^*(\epsilon^{-1}))$ rows and answers queries in sublinear time. The number of rows is only a factor of $\log^*(\epsilon^{-1})$ more than that used by the state-of-the-art algorithm prior to our paper due to Nelson, Nguyen and Woodruff (RANDOM’12). Their algorithm runs in time at least linear in the universe size $n$, which is highly undesirable in streaming applications. Our approach is based on an iterative procedure, where most unrecovered heavy hitters are identified in each iteration. Although this technique has been extensively employed in the related problem of sparse recovery, this is the first time, to the best of our knowledge, that it has been used in the context of $\ell_1$ heavy hitters. Along the way, we also give sublinear time algorithms for the closely related problems of combinatorial group testing and $\ell_1/\ell_1$ compressed sensing, matching the space usage of previous (super-)linear time algorithms.


Similar resources

On Low-Risk Heavy Hitters and Sparse Recovery Schemes

We study the heavy hitters and related sparse recovery problems in the low-failure probability regime. This regime is not well-understood, and has only been studied for non-adaptive schemes. The main previous work is on sparse recovery by Gilbert et al. (ICALP’13). We recognize an error in their analysis, improve their results, and contribute new non-adaptive and adaptive sparse recovery algori...


Reducing the Randomness Complexity of Property Testing, with an Emphasis on Testing Bipartiteness

Property testers are algorithms whose goal is distinguishing between inputs that have a certain property and inputs which are far from all instances with this property. We show that for a wide variety of properties, there exists no deterministic tester that queries only a sublinear number of input entries. Therefore, most sublinear property testers must be probabilistic algorithms. Nevertheless...


CR-precis: A deterministic summary structure for update data streams (arXiv:cs/0609032v1 [cs.DS], 7 Sep 2006)

We present the CR-precis structure, that is a general-purpose, deterministic and sub-linear data structure for summarizing update data streams. The CR-precis structure yields the first deterministic sub-linear space/time algorithms for update streams for answering a variety of fundamental stream queries, such as, (a) point queries, (b) range queries, (c) finding approximate frequent items, (d) ...


CR-precis: A deterministic summary structure for update data streams ([cs.DS], 17 Sep 2006)

We present the CR-precis structure, that is a general-purpose, deterministic and sub-linear data structure for summarizing update data streams. The CR-precis structure yields the first deterministic sub-linear space/time algorithms for answering a variety of fundamental queries over update streams, such as, (a) point queries, (b) range queries, (c) finding approximate frequent items, (d) findin...


High Throughput Heavy Hitter Aggregation

Heavy hitters are data items that occur at high frequency in a data set. Heavy hitters are among the most important items for an organization to summarize and understand during analytical processing. In data sets with sufficient skew, the number of heavy hitters can be relatively small. We take advantage of this small footprint to compute aggregate functions for the heavy hitters in fast cache ...



Journal:
  • CoRR

Volume: abs/1712.01971

Publication date: 2017